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NONLINEAR MANIFOLD REPRESENTATIONS FOR 
FUNCTIONAL DATA 

By Dong Chen and Hans-Georg Muller 1 

University of California, Davis 

For functional data lying on an unknown nonlinear low-dimen- 
sional space, we study manifold learning and introduce the notions of 
manifold mean, manifold modes of functional variation and of func- 
tional manifold components. These constitute nonlinear representa- 
tions of functional data that complement classical linear represen- 
tations such as eigenfunctions and functional principal components. 
Our manifold learning procedures borrow ideas from existing non- 
linear dimension reduction methods, which we modify to address 
functional data settings. In simulations and applications, we study 
examples of functional data which lie on a manifold and validate the 
superior behavior of manifold mean and functional manifold compo- 
nents over traditional cross-sectional mean and functional principal 
components. We also include consistency proofs for our estimators 
under certain assumptions. 

1. Introduction. Nonlinear dimension reduction methods, such as locally 
linear embedding [28], isometric mapping [31] and Laplacian eigenmaps [2], 
have been successfully applied to image data in recent years. A commonly 
used example is the analysis of photos of a sculpture face taken under dif- 
ferent angles and lighting conditions. The number of pixels of these images 
is huge, but their structure only depends on a few variables related to angle 
and lighting conditions. It is then advantageous to treat the observed image 
data as a manifold that is approximately isomorphic to a low-dimensional 
Euclidean space. 

Unlike shape analysis [21] and the recent diffusion tensor imaging [17], 
where it is assumed that the form of the manifold is known a priori, nonlin- 
ear dimension reduction methods usually are manifold-learning procedures, 
where the manifold is not known but it is assumed that it possesses certain 



Received June 2011; revised September 2011. 

Supported in part by NSF Grants DMS-08-06199 and DMS-11-04426. 
AMS 2000 subject classifications. 62H25, 62M09. 

Key words and phrases. Functional data analysis, modes of functional variation, func- 
tional manifold components, dimension reduction, smoothing. 

This is an electronic reprint of the original article published by the 

Institute of Mathematical Statistics in The Annals of Statistics, 

2012, Vol. 40, No. 1, 1-29. This reprint differs from the original in pagination and 

typographic detail. 



1 



2 



D. CHEN AND H.-G. MULLER 



features which are preserved in the observed data. For instance, locally lin- 
ear embedding preserves the manifold local linear structure while isometric 
mapping preserves geodesic distance. Their inherent flexibility predisposes 
these methods for extensions to functional data, where one rarely would have 
prior information available about the nature of the underlying manifold. 

Our goal is to explore manifold representations of functional data. Which 
observed sets of functions are likely to lie on a low-dimensional manifold? 
And how should this be taken into consideration? In contrast to multi- 
variate data, functional data are recorded on a time or location domain, 
and commonly are assumed to consist of sets of smooth random functions. 
Auspicious examples where functional manifold approaches may lead to im- 
proved representations include time-warped functional data [12, 33], density 
functions [23], and functional data with pre-determined and interpretable 
modes [18]. In such situations, the established linear functional approaches, 
such as cross-sectional mean and functional principal component analysis 
(FPCA) often fail to represent the functional data in a parsimonious, efficient 
and interpretable way. Manifold approaches are expected to be especially 
useful to represent functional data inherently lying on a low-dimensional 
nonlinear space. 

In this paper, we develop a framework for modeling I? functions on un- 
known manifolds and propose pertinent notions, such as manifold mean, 
manifold modes of functional variation and functional manifold components, 
as elements of a functional manifold component analysis (FMC A) . Manifold 
means complement notions of a specifically modified functional mean, such 
as the "structural mean" [22]. A major motivation for this proposal is that 
functional principal component plots, for example, displaying second ver- 
sus first component, are quite often found to exhibit "horseshoe" shapes, 
that is, nonlinear dependence in the presence of uncorrelatedness (as princi- 
pal components by definition are always uncorrelated). An example of this 
"horseshoe shape" is provided by the Berkeley growth data (see upper right 
panel of Figure 5). In such situations, one may wish to "unwrap" the "horse- 
shoe" into linear structures by techniques similar to those used in nonlinear 
dimension reduction. When attempting to "unwrap" functional data, one 
encounters specific difficulties: Often the underlying smooth functions are 
not directly observed, but instead need to be inferred from a limited num- 
ber of noise-contaminated measurements that contain the available informa- 
tion for each subject in the sample. To address these problems, we develop 
a modified ISOMAP [31] procedure, by adding a data-adaptive penalty to 
the empirical geodesic distances, and employ local smoothing to recover the 
manifold. 

The paper is organized in the following way. In Section 2, we describe 
what we mean by a functional manifold, manifold mean, manifold modes 
of functional variation and functional manifold components. We develop 
corresponding estimates in Section 3 and discuss their asymptotic properties 
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in Section 4. Sections 5 and 6 are devoted to illustrations of the proposed 
methodology for both simulated and real data. Detailed proofs can be found 
in an online supplement [5]. 

2. Manifolds in function space. 

2.1. Preliminaries. A manifold Ai can be expressed in terms of an atlas 
consisting of a group of charts (U a ,(p a ), where U a are open sets covering Ai 
and (p a , the coordinate maps, map the corresponding U a onto an open subset 
of ]R rf . Additional assumptions on ip a are usually imposed in order to study 
the structure of Ai [8, 16]. 

In this paper, we only consider "simple" functional manifolds Ai in L? 
space, where Ai is isomorphic to a subspace of the Euclidean space, that 
is, the manifold can be represented by a coordinate map cp : Mr — > Ai C L? , 
such that ip is bijective, and both <p, p~ l are continuous, in the sense that 
if 9 n , 6 £ R d and \\8 n - 0|| -> 0, \\<p(0 n ) ~ <p(P)\\i? -> 0; if x n ,x£Ai and 
\\x n — x\\ L 2 — > 0, ||(^~ 1 (x n ) — — > 0. Here, d is the intrinsic dimension 
of the manifold Ai. Such "simple" manifold settings have been commonly 
considered in the dimension reduction literature, for example in [31]. 

For a continuous curve defined on the manifold j: [0, 1] — > Ai, define the 
length operator 



where the supremum is taken over all partitions of the interval [0, 1] with 
arbitrary break points = sq < Si < ■ ■ ■ < s n = 1. We call (p an isometric 
map if £(7) = °7) for any continuous 7, where Liip -1 07) is similarly 

defined as in (2.1) with the L? norm replaced by the Euclidean norm. We 
say Ai is an isometric manifold if there exists an isometric coordinate map (p. 
The isometry assumption is pragmatically desirable and can be found in 
many approaches [9, 31]. Conditions under which isometry holds for image 
data are discussed in [10]. 

We use the notation ip = tp -1 and refer to ip as the representation map. 
The manifold Ai is naturally equipped with the L? distance, which, due to 
the nonlinearity of Ai, is not an adequate metric [31]. More useful is the 
geodesic distance 



where the infimum is taken over all continuous paths 7 on Ai. The geodesic 
distance is the length of the shortest path on Ai connecting the two points, 
and therefore is adapted to Ai. 



n-l 



(2.1) 




(2.2) 



d g (x 1 ,x 2 ) =inf{L(7) 17(0) =£1,7(1) = x 2 } 



2.2. Manifold mean and manifold modes of variation. Suppose Ai is 
a functional manifold of intrinsic dimension d and ip is a representation 



4 



D. CHEN AND H.-G. MULLER 



map for A4. Define, with respect to a probability measure Q in M. d , 

(2.3) ^W)}, v M =r\»), 

where fi is the mean in the (i-dimensional representation space, and fi M is 
the manifold mean in I? space. If Ai is isometric, the manifold mean fi M 
is uniquely defined for all isometric representation maps, as the following 
results shows. 

Proposition 1. Suppose the random function X lies on a functional 
manifold M of intrinsic dimension d and ip is a representation map for M . 
If ijj is isometric, the manifold mean fi M in (2.3) has the following alterna- 
tive expression: 

(2.4) ^■ M = argminE^(x,X), 
where d g denotes the geodesic distance defined in (2.2). 

The expected value in equation (2.4) is with respect to the probability 
measure that is induced by the map ip; see also [3]. Equation (2.4) defines the 
Frechet mean for geodesic distance d g (•,•), and therefore does not depend on 
the choice of the isometric map tjj. The motivation to consider the manifold 
mean is that the traditional cross-sectional mean for functional data in I? 
has significant drawbacks as a measure of location when the data indeed 
lie on a nonlinear functional manifold. Estimates of L 2 means, obtained by 
averaging observed sample curves, can be far away from the data cloud in 
such situations, and therefore do not represent the data in a meaningful way. 
Going beyond the mean, one encounters analogous problems when linearly 
representing such random functions in an L 2 basis, such as the Fourier, B 
spline or eigenfunction basis. 

Consider random functions X E L 2 (T) defined on a bounded domain T ■ 
With fj,(t) = EX(t) and G(t,s) = Cov(X(t),X(s)), according to Mercer's 
theorem [1], if the covariance function G(t,s) is jointly continuous in t, s, 
there is an orthonormal expansion of G(t, s) in terms of the eigenvalues {A^ : 
k > 1} (ordered nonincreasingly) and associated eigenfunctions {(pk '■ k > 1}, 

oo 

(2.5) G(t,s) = Y,^kMt)Ms), t,s€T. 

k=l 

By the Hilbert-Schmidt theorem [14, 27], X can be expressed in terms of 
the so-called Karhunen-Loeve representation, 



k=l 

(2.6) 



X(t)=fi(t)+J2Z k <f> k (t), t€T, 
k=l 

e*= J (x(t)-ii(t))<i> k (t)dt, 
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where the are uncorrelated random variables with mean and variance A&, 
known as functional principal components (FPCs). 

In the manifold case, the FPCs intrinsically lie on a ci-dimensional man- 
ifold. Therefore, we expect that the FPCs do not provide a parsimonious 
representation of X. A better adapted and more compact representation 
can be obtained through nonlinear manifold modes of functional variation 
that are defined below. The established eigenfunction-based modes of func- 
tional variation [4, 19] are 

(2.7) X J , a = f i + a\) /2 (j) J , j = l,2,...,aeR, 

1/2 

where factors Xj standardize the scale for different j and the functional 
variation in the direction of eigenfunction <j)j is visualized by the changing 
of functional shapes as a varies. However, when the functional data lie on 
a manifold, neither \i nor Xj tCl may belong to M, so that these linear modes 
will not provide a sensible description of the variation in the data. 

To address this problem, we define functional manifold component (FMC) 
vectors ej € M. d , j = 1, ...,d, by the eigenvectors of the covariance matrix 
of ip{X) £ R d , that is, 

d 

(2.8) Cov(V(X))=^Afe i e/, 

i=i 

where A^ > ■•■ > A^ are the eigenvalues of Cov('0(^))- The manifold 
modes of functional variation are 

(2.9) X^ = <rV + a(Af ) 1/2 ei), j = 1,. . . ,d,a E.M., 

where [i is the mean in the d-dimensional representation space according to 
measure Q, as given in (2.3). A distinct advantage of manifold-based modes 
of functional variation over the principal component based version (2.7) 
is that in (2.9) only finitely many modes are needed, while (2.7) requires 
potentially infinitely many components. The manifold modes X^ a are unique 
for the case of isometric M, as shown in the following. 

Proposition 2. Suppose ijj and if; are two isometric representation maps 
for a functional manifold M. of intrinsic dimension d. Let Xj^ be the j th ma- 
nifold mode defined in (2.9) based on representation map ip, and X^ be the 
jth manifold mode using map ip. Then Xj^ = X^ for all aSlR and 1 < j < d, 
if the eigenvalues of Cov (ip(X)) and of Cov [tjj{X)) are of multiplicity one. 

For each X £ A4, given the representation map tp, X can be uniquely 
represented (due to the bijectivity of tp) by a vector # = . . . , dd) £ K rf 

(2.10) X = ip~ 1 (^ + J2$je^ ^ = (V>PQ-/W, j = l,..., d, 
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where n and are denned in (2.3) and (2.8), respectively, (•, •) is the inner 
product in R d and $j are uncorrelated r.v.s with mean and variance Xj. 
We call "Qj the functional manifold components (FMCs) in the representation 
space. 

3. Estimating functional manifolds. Suppose we observe {Yy : 1 < i < n; 
1 < j < riij- which are noise-contaminated measurements made on n inde- 
pendent realizations Xi of a random function X £ M, according to the data 
model 

Yij — Xi (tij ) -|- £ij . 

Here the tij are the time points where the functions are sampled, and 
the Eij S R are i.i.d. errors with mean and variance a 2 . A first task is 
to find an approximation ip to the representation map ip based on the ob- 
served Y^. We also require the inverse ip~ l . Prior knowledge about the data 
may suggest a specific form for i/j [18], or one may have direct observations 
of ip{Xi). But in general, the representation map i/j is unknown and needs 
to be determined from the data. 

3.1. Inferring d- dimensional manifold representations. Following [31], 
we use the pairwise distances between observed data to obtain a map ijj that 
preserves the geodesic distances. Alternative approaches include LLE [28] 
and Laplacian eigenmaps [2]. While these methods have been developed for 
multivariate data, they can be adapted to functional data in a two-step 
procedure as follows. 

In a first step, given an intrinsic dimension d of A4, adopt the pro- 
posal of [31] to obtain the function tp : L 2 — > M. d only at the sample points 
{Xi, . . .,X n }, where Xj G L 2 , by 

n 

(3.1) argmin V {||^pQ) - ^{Xj)\\ - d g {X u Xj)} 2 . 

Here, d g (-,-) is the geodesic distance (2.2) and the minimum is taken over 
the vectors ip(Xi) £l £, ,i = l,...,n, formed by the values of tp on the func- 
tions Xi, that is, the goal is to find n vectors ip{Xi) GR rf ,i = l,...,n, that 
minimize (3.1). For this, one needs to estimate the geodesic distances, and 
then the minimizer ip(Xi) is obtained by multidimensional scaling (MDS) 
based on estimates of d g (Xi,Xj) [6]. Our asymptotic results pertain to a sec- 
ond step, where the assumed smoothness of ip is invoked to obtain global 
estimates for tp, as described in Section 3.2. As for i/j(Xi),i = l,...,n, as 
determined by (3.1), we assume that the minimization in (3.1) provides val- 
ues on or defines the target manifold at the sample points, that is, that 
ip(Xi) = ip(Xi),i = 1, ... ,n, or alternatively, that v n = ip(Xi) — tp(Xi) — > 0. 
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In order to approximate geodesic distances d g (Xi,Xj), we first aim at 
estimates of the L 2 distances \\Xi — Xj\\ L 2. For this purpose, the Karhunen- 
Loeve representation (2.6) can be used to obtain fitted curves, 

(3.2) Xf(t) = £(t) +J2^k(t). 

k<K 

Here, fi(t) and G(t,s) are first obtained by applying local linear one-dimen- 
sional and two-dimensional smoothers to the pooled data; then eigenfunc- 
tions 4>k(t) and eigenvalues X k are extracted by classical vector spectral anal- 
ysis applied to a discretized version of the estimate G(t, s) of the covariance 
surface G(t,s) = Cov(X(t),X(s)); and then the FPCs are approximated 
by discretizing integrals 

rii 

(3.3) ii k = ^{Yij - p.(Uj)}(i>k(tij)(tij - Uj-i) 

3=2 

or alternatively by conditional expectation (for details on these steps, see [34]), 
(3-4) i ik = \ k £ k ±^){Y i -ii l ), 

where ifc = ■ • • , 4>k(ti ni )), ( s yJj7 = G(tij,t u ) + a 2 I, l<j, l<n i} 

fii = (/t(£ji), . . . ,fi(ti ni )), and a 2 is estimated from the difference between 
empirical variances of Yij and G(t,s). The conditioning method (3.4) is 
the only available option if the data are sparsely sampled. To ensure that 
a sufficiently large number of components is included in the truncated ex- 
pansion (3.2), one may choose K by requiring a large fraction of variance 
explained (FVE), that is, 

(3.5) K = mm{k: ^ l - kXl >l-a 

for, say, a = 0.05, where the A; are estimates of the eigenvalues A; in (2.5). 
The resulting L 2 distances are ||Xf - Xf\\ L i = {£feLi(£fc - €jk) 2 } 1/2 - 

Note that alternatively to representation (3.2), one can also directly apply 
local constant or local linear smoothing to obtain smooth trajectories in the 
case of dense and balanced designs, for example, using Nadaraya-Watson 
kernel estimators, 

(3.6) Xi(t) 



where K\ and h\ are smoothing kernel and bandwidth. For the smoothing 
kernel one can use any standard kernel such as the standard Gaussian density 
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function or the Epanechnikov kernel, while in practice h\ may be chosen by 
cross-validation or generalized cross-validation. 

Then the pairwise I? distances are simply \X% — Xj \\ L 2 . We will not explic- 
itly explore this alternative smoothing approach in our theoretical analysis, 
but note that essentially the same results as those reported below hold for 
this alternative approach, by minor extensions of our arguments. In the im- 
plementations (simulation and data analysis), we use both approaches (3.2) 
and (3.6). The estimated random trajectories, obtained though (3.2) or (3.6), 
generally are not lying on the manifold A4, as they are merely approxima- 
tions to the true unknown functions, due to additional noise and discrete 
sampling of the random trajectories. However, these estimates, owing to their 
consistency, will fall inside a small L 2 -neighborhood around A4 . Asymptotic 
properties are discussed in Section 4. 

Since the geodesic is the shortest path connecting points on a manifold, 
one may first connect the points inside small I? neighborhoods and then 
define the path between two far away points by moving along these small 
neighborhoods, and then find the geodesic by the shortest path connecting 
through such neighborhoods. This is essentially the idea of the ISOMAP 
algorithm [31]. The performance of this method however proved somewhat 
unstable in our applications, as functional data typically must be inferred 
from discretized and noisy observations of underlying smooth trajectories 
and therefore do not exactly lie on the manifold, as is assumed in ISOMAP. 

In such situations, due to random scattering of the data around the man- 
ifold, the shortest path found by the ISOMAP criterion may pass through 
"empty areas" outside the proper data cloud. This problem can be effectively 
addressed by modifying the ISOMAP criterion, by additionally penalizing 
against paths that include sections situated within "empty regions" with few 
neighboring data points. Density-penalized geodesies are characterized by se- 
quences of L 2 functions (Wi, W2, . . . , W m ) from the starting point W s = W\ 
to the end point W e = W m of the geodesic, where each of the Wj stands for 
one of the observed functions X{ (with unrelated index), and are defined as 

{m— 1 
V \\W i -W i+1 \\ L 2(l + P s (W i ,W i+1 )): 
i=i 

(3.7) 

|| Wi- Wi+l|| L 2 <£ 

Here the parameter e limits the step length, and the penalty function P$ is 
determined by the density of the data cloud around Wi and Wj+i, 

P s (W u W z+1 )=p-l 1 I(p hl+1 <5), 
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where Pi , i+1 = min{#{W j : \\Wj - W t \\ L 2 < e}, #{Wj : \\W, - W i+1 \\ L * < e}} 
and # denotes the cardinality of a set. By selecting the parameter 6, one 
can control the threshold of the local density of points, below which the 
penalty P$ kicks in. The ISOMAP algorithm corresponds to the special case 
where 5 = 0, P$ = 0. 

The choice S > leads to "penalized ISOMAP" or P-ISOMAP, where the 
penalty parameter 5 may be selected data-adaptively by cross-validation. 
The choice of S and also of the step size parameter e is discussed in Sec- 
tion 3.3. If the manifold is very smooth, a large s and small m will lead 
to a sufficiently good estimate of the geodesic distance. A detailed dis- 
cussion of the convergence of the estimated geodesies in the framework of 
ISOMAP can be found at http://isomap.stanford.edu/BdSLT.pdf. For 
the proposed P-ISOMAP, we implement the minimization of S(W s ,W e ) 
by Dijkstra's algorithm, which selects m and the geodesic paths (W s = 
Wi,W2, • • • , W m -i,W e = W m ). The resulting estimated geodesic distance is 

m— 1 

(3.8) d g (W s ,W e ) = \\Wj - W j+1 \\ L 2, 

0=1 

where Wj = Wj or , depending on which preliminary approximation is 
used for Wj. Once these distances have been determined, an application of 
MDS yields ip(Xi), in the same way as in the standard ISOMAP method. 

3.2. Obtaining the global map and representing sample trajectories. For 
any location 6 £ M. d , we find i^~ 1 (6) by local weighted averaging, that is, 

(3.9) ib~ l (0 =^— i l> U—l 

where k is a (i-dimensional kernel, like the Epanechnikov kernel k(ui,..., 
u d) = (j) d IYk=i{( 1 ~ u l)I(\ u k\ < 1)}, with H = hl dxd for a suitably chosen 
bandwidth h, Xi could be either Xi as in (3.6) or Xf as in (3.2), and ipi^i) 
is defined after (3.8). We use cross-validation to select h (see Section 3.3). 
The asymptotic properties of (3.9) will be discussed in Section 4. 
Specifically, as predictor of Xi, we propose 

f3 1Q) ±M = Y. i #<H-i${X i )-j,{X j )))X j 

borrowing strength from local neighbors in the <i-dimensional representation 
space. This can be seen as an alternative to representation (3.2), where we 
use the FPCs and borrow strength from the whole data set to estimate func- 
tional mean and eigenbasis. As before, we note that (3.10) is not necessarily 
in J\A, but will be in a small neighborhood asymptotically and in compar- 
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ison with (3.2), (3.10) usually proves to be a much better predictor of Xi 
for functional manifold data as shown in the simulations and applications 
in Section 5. Asymptotic properties are discussed in Section 4. 
Definition (2.3) suggests to estimate the manifold mean by 

(311) ~m EMH-^(Xj) - jM))Xi 



5>(ir-i(iKx 4 )-A)) 

where fi = ^ X^i^(^)- The FMC vectors ej defined in (2.8) are estimated 
by eigendecomposition of the sample covariance matrix of ijj(Xi), that is, 



and ej are such that 



(3.12) 






where the Aj^ are ordered to be nonincreasing in j. From (2.9) and (3.9), 
we obtain estimates of the manifold modes as 



(3.13) X 



Ei KjH-HHX*) - A - a(Af ) 1 / 2 e J })X t 



where j = 1, . . . , d and a£K. 

3.3. Selection of auxiliary parameters. We use 10-fold cross-validation 
to simultaneously choose the step size e, the truncation parameter 5, and 
the smoothing bandwidth h (see Sections 3.1 and 3.2). The number of can- 
didates for s and 5 is kept small so that the cross-validation procedure runs 
reasonably fast. Candidates for the step size e are the median distance of the 
5th, the 8th and the 12th nearest neighbor; those for 8 are selected such that 
0%, 2%, 5% and 10% of the data with the lowest local density estimates are 
penalized. Each of 10 subgroups of curves denoted by V±,. . Vio is used as 
a validation set, one at a time, while the remaining data are used as training 
set. 

In an initial step, we use the whole data set and a given e, 5 to deter- 
mine tj)(Xi), followed by estimation of Xi = ^ _1 (i9j) for Xi in the valida- 
tion set, using (3.9) and assuming that only those Xj in the training set 
are known. Denoting the value of the estimated trajectory X{, evaluated 
at time tu, by Xn, the sum of squared prediction errors for the validation 
set V k is SSPE fc = ]Ciev fc Td=i( X U ~ Y u) 2 i where Yu = X^tu) + en is the 
observed value of trajectory Xi at time tij. The cross-validation choice is 

the minimizer of MSPE(e, h, 5) = ^}^™ k . 
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Following [31], the intrinsic dimension d can be chosen by the 1 — (3 frac- 
tion of distances explained (FDE), that is, 

( \\D P — D\\w 1 
(3.14) d = mm\p: " -—. <p\, 

p I \\ D \\F J 

where we choose (3 = 0.05 and D, D p are n by n distance matrixes with Dij = 
dg(Xi,Xj) as in (3.8), = U p (Xi) - $ p (Xj)\\ and where ip p denotes the 
MDS solution (3.1) in MP, and || • \\f is the matrix Frobenius norm, = 
{Eij D ijV /2 ■ Note that \\D p -D\\ F is the square root of the minimized 
value of (3.1). 

4. Asymptotic properties. We provide the specific convergence rate 
of Xf~ , defined in (3.2), under assumptions (Al)-(A5) in the Appendix. 
Note that condition (A3) requires that the random functions are sampled at 
a dense design. Our starting point is that the manifold can be well identified 
at the sample points through ISOMAP, or alternatively, that the ISOMAP 
identified manifold may be viewed as the target. The difference between the 
target and the identified manifold from ISOMAP is quantified by a rate v n 
that is assumed as given; if the target manifold corresponds to the manifold 
as identified at the sample points, we may set v n = 0. The theoretical anal- 
ysis aims to justify the new manifold representations that we propose, and 
for this it is essential to consider the behavior of the estimates across the 
entire function space. Therefore, our theoretical results demonstrate how to 
extend local behavior at the sample points to obtain global consistency of 
the proposed functional manifold representations. 

As the convergence is for K = K n — > oo as n — > oo , the rate of decline of 
the eigenvalues in (2.5) and also lower bounds on the spacing of consecutive 
eigenvalues, as postulated in (A4) are relevant, with a requirement of poly- 
nomially fast declining eigenvalues. Required smoothness and boundedness 
assumptions for X G A4 are as in (A5). 

Proposition 3. Assume (Al)-(A5) in the Appendix, and define r n = 
max{-/=p-, ^ h , ^ hy }- If there are infinitely many nonzero eigenvalues 

in (2.5), which are all of multiplicity one, then for sequences K = K n — > oo, 
subject to r n K a2+l l 2 — > 0, where a<i is a constant such that — A^+i > 
C^fe - " 2 for some C2 > and where K < Kq with Kq = min{i : Aj — Aj+i < 
2D n } - 1 and D n = {J r2 (G(t,s) - G(t,s)) 2 dtds} 1 / 2 where G is defined 
in (2.5) and G is defined after (3.2), it holds that 

(4.1) ||lf - Xi\\v =O p {r n K^ +l l 2 + K-^-^l 2 ) 

for Xf defined in (3.2), where a± is such that A& < C\k~ ax for all k and 
some C\ < 00. 
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We note that under the assumptions, Kq — > oo. The first term on the r.h.s. 
of (4.1) is due to estimation error and the second term is due to truncation 
error. In the special case when there are only finitely many nonzero 
in (2.5), it can be shown that the rate in (4.1) simply becomes O p {r n ). Next 
we discuss the convergence of the estimates that appear in (3.12). 

Proposition 4. Under (Bl) and (B2) in the Appendix, 

(4.2) \\jl-fi\\ = O p (v n + ^ 

where fj, and fi are defined in (2.3) and (3.11), and v n = sup i=1 n \\ijj{Xi 
ip{Xi)\\. If the j th eigenvalue of Cov {ip (X)) is of multiplicity one, then 

(4.3) ||ej - 6j|| = Op[v n + 



(4.4) l XM_ X M l=0p L n + j= 



where Xy^, ej, ej and Xj 4 are defined in (2.8) and (3.12), respectively. 

Theorem 1. Under (A1)-(A5), (Bl), (B2) and (G1)-(C3) in the Appen- 
dix, assume that the density function f of ip{X) £ M. d satisfies f(0) > for 
a specific = i/j( x ) an d that h > is selected such that h — > 0, ra -1 /i~ 2( - d+1 ) — > 
and h~^ d+r) Ev n 0. Then ^{O) defined in (3.9), using Xi = Xf , is 
a consistent estimate o/-0~ 1 (0). Specifically, defining = k>K C 2 ,} 1 ^ 2 
where = j (X — EX)^ and the orthonormal basis {4>k : k > 1} is given 
in (2.5), and defining R K (0) = T% {0)) , where R K {0) ^ as K = 
K n — )• oo, it holds that 

wr\o) - r\o)\\ L * 

(4-5) 

= O p (h 2 + -L= + V f + R K (9) + K a ^/ 2 r n ) , 

where r n , a2 and v n are as in assumptions (A3), (A4) and (Bl). 

Note that Rk{0) corresponds to the truncation error for il)~ 1 (6) £ M. 
The last term K a2+l l 2 r n is due to the estimation error as in Lemma 1. 
The middle term ^ reflects the estimation error of the weights, which is 
influenced by the scale of the bandwidth. The first part h? H — is the 

J Vnh d 

optimal rate when the Xi and tjj are known, reflecting an intrinsically d- 
dimensional smoothing problem. Related findings are discussed in [3]. 
For the manifold modes, we obtain the following corollary. 

Corollary 1. Under the conditions of Theorem 1, for a given a£l 
and 1 < j < d, assume that /(/x + a(A^ /! ) 1 / 2 ej) > and that h is chosen 

as in Theorem 1. Then the estimated manifold modes Xj^ as in (3.13), 
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substituting Xi = Xf , are consistent. Specifically, 

II Y-M _ x^ II 
II j, a ^-j'jCkIIZ/ 2 



(4.6) 

= Jh 2 + -±= + ^ + -±- + R K + K^ +1 /\ A 
where R K = T*(Xf A a ). 

An immediate consequence of these results is that the manifold repre- 
sentation given in (2.10) provides a consistent representation of all random 
functions in the functional manifold. Proofs of all propositions, theorem and 
corollary can be found in the supplementary file [5]. 

5. Examples and simulation study. 

5.1. Functional manifolds and isometry. To illustrate our methods and 
to discuss the impact of the critical isometry assumption, we consider the 
following three example functional manifolds: 

(i) A one-dimensional (d = 1) functional manifold 
•Ml = I X G L 2 ([-4,4]) :X(t) = fi(h a (t)), 



8f' /8+0 - 5 s a (l-s)ds 

h a (t) = ^—, 4,a>-l 

/ V(l-s)ds 

where fi(t) = ^exp{-i(t + 2) 2 } + -^=exp{-2(t - 2) 2 }. This corresponds 
to random warping of a common shape function fi, which has two peaks. The 
time warping function h a is generated from the cumulative Beta distribution 
family and a is a random parameter, a = max(— 1, Z), where Z ~ N(0, 0.09). 
(ii) A two-dimensional (d = 2) functional manifold 

M 2 = (xe L 2 ([-4,A]):X(t) = — =L=exp 



a > 0,(3 £ 



This manifold is a collection of Gaussian densities, corresponding to a shift- 
scale family, where a = max(0, Z), Z ~ N(l, 0.04) and /3 ~ N(0, 1). 
(hi) Another two-dimensional (d = 2) functional manifold 

M 3 = lie L 2 ([-4,4]):X(t) = ^expi At-0.8-a) 2 



v 7 ^ I 2 

-= exp{-(t + 0.8 - /3) 2 }, a, € 
V 71 " 
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Fig. 1. Manifolds M\-M-i- Top left panel: functions on Mi for a = 0.6,0.8,1.0,1.2,1 .4. 
Top right panel: corresponding identity-subtracted warping functions h a (t) —t. Middle left 
panel: functions on M2 for a = 0.4, 0.7, 1.0, 1.3, 1.6 and f3 = 0. Middle right panel: func- 
tions on M2 for f3 — 0.4, 0.7, 1.0, 1.3, 1.6 and a = 0. Bottom left panel: functions on Ms for 
a = —2, —1, 0, 1, 2 and f3 — 0. Bottom right panel: functions on M3 for f3 = —2, —1, 0, 1, 2 
and a = 0. 



a mixture of two peaks with randomly varying centers, where a ~ N(0, 1) 
and f3 ~ N(0, 1). Note that the two peaks will merge to a larger peak when 
their locations are close, so this set of functions has a randomly varying 
number of peaks. 

Functional manifolds M 1-.M3 are illustrated in Figure 1. We note that M± 
is an isometric manifold and M2 is approximately isometric, while M3 is 
not isometric. This can be seen as follows. For functions X G L 2 on a differ- 
entiate isometric manifold with representation X = ip~ 1 (6i, . . . ,6^), using 

the definition of isometry given after (2.1), the condition f g0 k ||f^-0)|| i2 dJd k = 

9f. — 6fcfork = l,...,d and any 9%, 9\ E M. is equivalent to isometry. Therefore, 
the existence of a parametrization of the map tp for which the L 2 norms of the 
partial derivatives of X with respect to the parameter components are con- 
stant is sufficient and necessary for tp to be isometric. For one-dimensional 
manifolds such as M\, one can always find such a parametrization, as long 
as X is differentiable in the parameter and the derivative is L 2 integrable in t. 
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Table 1 

Fraction of distances explained (3.14) for isometric manifold fits with different 
dimension d (other parameters are optimized), for two signal-to-noise ratios R 



Manifold 


R 






d 






1 


2 


3 


4 


5 


Mi 


0.1 


0.998 


0.999 


0.999 


0.999 


0.999 




0.5 


0.9778 


0.993 


0.995 


0.995 


0.996 


M 2 


0.1 


0.914 


0.988 


0.994 


0.996 


0.996 




0.5 


0.902 


0.971 


0.974 


0.978 


0.980 


M 3 


0.1 


0.699 


0.932 


0.957 


0.977 


0.980 




0.5 


0.639 


0.906 


0.948 


0.955 


0.958 


Growth 




0.947 


0.972 


0.980 


0.985 


0.988 


Yeast 




0.891 


0.949 


0.981 


0.983 


0.984 


Mortality 




0.878 


0.954 


0.973 


0.980 


0.982 



For A^2j such a parametrization does not exist, but since ||^-(t)||z,2 = —c\ 

and 1 1 ^wit) | |l 2 = q c 2 f° r constants c±, C2 and as a is chosen to remain very 
close to 1, the natural parametrization approximately satisfies the condition 
for isometry. In contrast to Ai± and M2, the functional manifold .M3 is non- 
isometric and we include it as an example how the proposed methodology 
is faring when the key assumption of isometry is violated. As our consider- 
ations take place in a manifold learning framework, where the underlying 
manifold is unknown, an interesting aspect is to devise a data-based check 
to gauge the degree to which the isometry assumption can be expected to 
be satisfied. A natural metric for such a check is the fraction of distances 
explained (FDE), defined in (3.14). This criterion quantifies the percentage 
of geodesic distance that is preserved when fitting a d-dimensional isometric 
manifold to the data. For cases where the underlying manifold is actually 
nonisomorphic, the fitted manifold is an isometric approximation to the true 
underlying manifold, obtained by minimizing the stress function in the MDS 
algorithm. 

An informal goodness-of-fit criterion for isometry is to require FDE to 
be larger than 95%, and choosing the manifold with the smallest dimension 
that satisfies this criterion. In Table 1, values for FDE obtained for the 
simulated data for manifolds under two signal-to- noise ratios R 

(defined in the following subsection) are reported, with dimension d ranging 
from 1 to 5. The well-known fact that the stress function declines when 
the dimension of the projection space is increased underlies the traditional 
MDS-Scree Plot [6] and is reflected by the observed increase in the values 
for FDE as dimension increases. 

Applying the above check for isometry, we find that indeed the dimen- 
sions of the isometric manifold Ai\ and the near- isometric manifold 2 are 
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correctly selected, while the first two dimensions of the isometric manifold 
approximation to the nonisometric manifold M3 are not sufficient. Thus, 
the nonisometric nature of M3 means that the dimension of the underlying 
functional manifold cannot be correctly identified and instead the proposed 
algorithm will find a higher-dimensional isometric manifold to represent 
The price to pay for a suitable isometric approximation is increased dimen- 
sionality, which in this example ends up larger than 2 for the approximating 
isometric manifold. We note that an approximating isometric manifold can 
always be found, since the linear and therefore intrinsically isometric man- 
ifold of infinite dimensionality that is spanned by the eigenfunction basis 
contains the random functions of the sample, according to the Karhunen- 
Loeve theorem, and is always applicable. 

While we can always find a near-isometric manifold of large enough di- 
mensionality with the proposed algorithm, when the data lie on a lower- 
dimensional nonisometric manifold, these approximating isometric mani- 
folds may not be efficient, since they do not provide the lowest-dimensional 
possible description of the data. Nevertheless, an approximating isometric 
nonlinear manifold obtained by the proposed approach often will present 
a much improved and lower-dimensional description when compared to the 
alternative of classical linear basis representation. This is exemplified by the 
functional nonisometric manifold M3, which in the following subsection is 
shown to be much better represented by an isometric manifold than by a lin- 
ear basis. So the price that the isometry assumption exacts in nonisometric 
situations is that the proposed approach leads to a more or less suboptimal 
representation, which however will often be substantially lower-dimensional 
than an equally adequate linear representation. We conclude that even in 
nonisometric situations the proposed approach can often be expected to lead 
to improved representations of functional data. 

5.2. Simulation results. We simulate functional data from manifolds Mi— 
M3 as introduced in the previous subsection, aiming to study two questions. 
First, when the functional data lie on a manifold, whether it is isometric or 
not, does the proposed functional manifold approach lead to better (more 
parsimonious, better interpretable) representations of the data, compared to 
functional principal component analysis? Second, for noisy functional data 
that do not exactly lie on a manifold, how much improvement may one 
gain by adding the data-adaptive penalties implemented by P-ISOMAP, as 
described in Section 3.1? 

For these simulations, the actual error-contaminated observations of the 
functional trajectories are generated as Y$j = JQ(iy) + e^, £y ~ iV(0,<7 2 ) 
i.i.d., i = 1, . . . ,n, j = 1, . . . , m, where n = 200, tij equally spaced in [—4, 4] 
with 30 observations per trajectory, and the noise variance a 2 is such that the 
signal-to-noise ratio R is 0.1 or 0.5. We estimated manifold means fi M (2.3), 
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Fig. 2. Simulated data for manifold Adi. Here and in the following figures, color de- 
scriptions refer to the online version of the paper. Top left panel: five randomly selected 
curves. Top right panel: common shape function (solid red, corresponds to target mean), 
estimated manifold mean jX (3.11) (dash-dot black) and the L 2 mean (dashed blue). Sec- 
ond row: scatter plot of second versus first functional principal component (left) and second 
versus first functional manifold component (right), where the bold black dot represents the 
manifold mean and the blue cross dot represents the L 2 mean. Third row: estimates of 
principal component based mode Xi, a (2.7) (left) and of manifold mode X^ a (2.9) (right) 
of functional variation for a — —2,-1,0,1,2. Bottom row: two randomly selected curves 
(solid red), with the corresponding principal component based predictions (3.2) (dashed 
blue), and manifold based predictions Xf 4 (3.10) (dash-dot black) for L — d = 2. 



manifold modes of functional variation Xj^ (2.9) and obtained predic- 
tions Xf 4 (3.10), which were compared with predictions obtained by func- 
tional principal component analysis. 

Results for a simulation run are shown in Figures 2, 3 and 4 for manifolds 
M1-M3, respectively. The estimated manifold means are seen to be close 
to the corresponding intrinsic means, that is, the common shape function 
for manifold A4\, the standard Gaussian density for manifold M2 and the 
curve with no time shifts (a = /3 = 0) for manifold M3. On the other hand, 
the cross-sectional means are seen to be far away from these intrinsic means 
and therefore clearly are not useful as measures of location for these sets of 
functions. 



18 



D. CHEN AND H.-G. MULLER 




0.5 r 

o 

-0.5 



-0.5 



0.5 




1 r 
0.5 ■ 




0.5 r 

o ■ 

-0.5 

0.6 
0.4 
0.2 ■ 


-0.2 



-0.5 



So a °e "9 n 



0.5 




0.6 






0.4 






0.2 













-0.2 







Fig. 3. Simulated data for manifold Mi. Top left panel: five randomly selected curves. 
Top right panel: standard Gaussian density (solid red, corresponds to target mean), esti- 
mated manifold mean \± M (3.11) (dash-dot black) and the L 2 mean (dashed blue). Second 
row: scatter plot of second versus first FPC (left) and second versus first FMC (right), 
where the bold black dot represents the manifold mean and the blue cross represents the L 2 
mean. Third row: estimates of principal component based mode Xi, a (2.7) (left) and of 
manifold mode X{^ (2.9) (right) of functional variation for a = — 2, — 1, 0, 1, 2. Fourth 
row: estimates of X^, a (left) o-nd of X§% (right) for a — —2,-1,0,1,2. Bottom row: two 
randomly selected curves (solid red), with the corresponding principal component based 
predictions Xf (3.2) (dash blue), and manifold based predictions X^ 4 (3.10) (dash-dot 
black) for L = d = 3. 



The scatter plots of second versus first FPC indicate "horseshoe" shapes 
for manifolds A4\ and M.2- This diagnostic indicates that a functional man- 
ifold approach may be called for. We find that the location of the cross- 
sectional mean (at the origin, due to the zero expectation property of FPCs) 
typically lies in a relatively sparse region of the data in these scatter plots, 
while the manifold mean falls into a much denser area, which is another diag- 
nostic feature pointing to an underlying manifold. Complex two-dimensional 
surface curvature is observed for manifold M3. Comparing with Figure 1, 
we find that the manifold modes represent the inherent components of func- 
tional variation present in the data quite well, while the established princi- 
pal component based modes are not informative in describing the functional 
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Fig. 4. Simulated data for manifold Ms- Top left panel: five randomly selected curves. 
Top right panel: curve with no time shifts (solid red, corresponds to target mean), esti- 
mated manifold mean p, (3.11) (dash-dot black) and the L 2 mean (dash blue). Second 
row: contour scatter plot of second versus first FPC (left) and second versus first FMC 
(right), with the colors scaled from the third FPC or FMC. Third row: estimates of prin- 
cipal component based mode Xl, q (2.7) (left) and of manifold mode X{^ a (2.9) (right) 
of functional variation for a = — 2, — 1, 0, 1, 2. Fourth row: estimates of Xi, a (left) and 
of X^ a (right) for a = -2, -1, 0, 1, 2. Bottom row: two randomly selected curves (solid 
red), with the corresponding principal component based predictions (3.2) (dash blue), 
and manifold based predictions Xf 4 (3.10) (dash-dot black) for L — d — 3. 



variation. It is also obvious that the proposed predictions for individual 
trajectories Xi are more accurate in capturing amplitudes and locations of 
peaks. 

Leave-one-out predictions of the Xi are calculated using both functional 
principal components (3.2), resulting in X^, as well as the proposed new 
estimates Jtf 4 (3.10). For X[, we estimate the FPCs (2.6) of Xi using all 
data and then leave Xi out to obtain ft and for X^ 1 , we estimate ^(Xi) 
using all data and then leave Xi out in the local averaging step. Starting with 
L = l,d = 1, we increase L and d successively, obtaining the mean squared 
prediction errors MSPE = ^ E?=°i \\ x i ~ x i\\h, where X { = Xf or Xf 4 , 
for 1< d = L < 5. 



20 D. CHEN AND H.-G. MULLER 

Table 2 

Mean squared prediction errors and relative squared prediction errors for M1-M3 



MSPE with lord RSPE with Lord (%) 





R 


Method 


1 


2 


3 


4 


5 


1 


2 


3 


4 


5 


Mi 


0.1 


xt 


0.159 


0.034 


0.025 


0.021 


0.021 


41 


10 


6 


6 


6 






xr 


0.027 


0.015 


0.015 


0.014 


0.015 


7 


4 


4 


4 


4 




0.5 


xt 


0.173 


0.061 


0.057 


0.058 


0.058 


45 


16 


15 


15 


15 






xt 4 


0.090 


0.046 


0.046 


0.049 


0.053 


23 


12 


12 


13 


14 


M 2 


0.1 


xt 


0.054 


0.022 


0.013 


0.008 


0.007 


44 


17 


10 


7 


6 






xy 


0.022 


0.009 


0.007 


0.006 


0.006 


18 


8 


5 


5 


5 




0.5 


xt 


0.055 


0.025 


0.019 


0.018 


0.018 


45 


20 


16 


14 


14 






xt 4 


0.030 


0.017 


0.015 


0.014 


0.014 


25 


13 


12 


12 


12 


M 3 


0.1 


xt 


0.148 


0.059 


0.031 


0.023 


0.020 


59 


23 


13 


9 


7 






xt 4 


0.088 


0.025 


0.020 


0.020 


0.019 


35 


10 


8 


8 


8 




0.5 


xt 


0.154 


0.071 


0.053 


0.048 


0.048 


61 


28 


21 


19 


19 






xt 4 


0.124 


0.059 


0.047 


0.045 


0.044 


49 


24 


19 


18 


18 



The simulation results for manifolds M1-M3 are shown in Table 2. Gen- 
erally, the MSPE is reduced by 20% over the established linear method when 
using the manifold approach; this improvement exceeds 50% when L and d 
are small. Another metric of interest is the relative squared prediction er- 
ror of the model over the squared error when using the mean as predictor, 

RSPE = ',7)0 ' — *; p 2 . where X = ^ Xi, which can be interpreted 

2^ =1 ||JCi-X||2 2 ^ 1-1 

as fraction of variance that is left unexplained. In all three simulated mani- 
folds, RSPE is found to be much larger for the functional principal compo- 
nent representations, when the same number of components is used. This is 
because in the inefficient linear representation higher order functional prin- 
cipal components carry substantial variation. 

To quantify the efficiency of the data-adaptive penalties in the proposed 
P-ISOMAP procedure, we also calculated the MSPE using the unmodified 
ISOMAP. Parameters for ISOMAP were selected analogously to the descrip- 
tion in Section 3.3 by cross-validation. Since the most important comparison 
is for the case where d equals the intrinsic dimension, that is, 1 for Ai\ and 
2 for M 2 and M 3 , we calculated the ratio of the MSPE of P-ISOMAP 
over the MSPE of ISOMAP for these situations (Table 3). As anticipated, 
P-ISOMAP indeed exhibits increasing benefits for smaller signal-to-noise 
ratios. 

The influence of the selection of the step size parameter e in P-ISOMAP, 
defined in (3.7), on mean squared prediction errors is demonstrated in Ta- 
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Table 3 
Mean squared prediction error 





ratios for P-ISOMAP 


over 




ISO MAP 




R 


M.i A4.2 


M-3 


0.1 


0.9676 0.9679 


0.9402 


0.5 


0.8121 0.8879 


0.8302 



ble 4. Here d is fixed as the intrinsic dimension (1 for Ai\ and 2 for M2, 
.M3), while 5 and h are optimized by cross-validation for each e. We then 
select e from the median distances of the 3rd, 5th, 8th, 12th and 16th near- 
est points calculated over all sample data. From the results in the table, one 
finds that the results are not strongly sensitive to the selection of e, as long 
as it is in medium range. A good overall choice is median distance of 8th 
nearest neighbors. When e is chosen very small, some sample points that 
are not situated close to other sample points may become separated from 
the other data, or disconnected subgroups in the data may emerge, which 
renders the MSPE for small e inaccurate. In practice, we therefore impose 
a lower bound on e to ensure that the fraction of data that are not connected 
to other points when connecting through ^-neighborhoods stays below 5%. 

6. Applications. 

6.1. Berkeley growth study. In growth studies, one often observes phase 
variation in the trajectories. Some subjects reach certain growth stages (such 
as puberty in human growth) earlier than others. This leads to difficulties 
for the parsimonious modeling of growth patterns with linear methods, and 
more generally for methods that are based on L 2 distance between trajec- 
tories. Accordingly, cross-sectional mean estimation tends to fail in repre- 

Table 4 

Mean squared prediction errors using different e for P-ISOMAP 



£ 



Manifold 


R 


3 


5 


8 


12 


16 


Mi 


0.1 


0.029 


0.027 


0.031 


0.029 


0.033 




0.5 


0.116 


0.102 


0.090 


0.129 


0.135 


M 2 


0.1 


0.008 


0.010 


0.009 


0.010 


0.010 




0.5 


0.020 


0.018 


0.018 


0.017 


0.017 


Mz 


0.1 


0.029 


0.040 


0.025 


0.27 


0.033 




0.5 


0.052 


0.059 


0.065 


0.059 


0.066 
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Fig. 5. Berkeley growth data for girls. Top left panel: derivatives with the cross-sectional 
mean (dash blue) and estimated manifold mean fi M (3-11) (dash-dot black). Top right 
panel: scatter plot of second versus first FPC, where the bold black dot represents the 
manifold mean and the blue cross represents the cross-sectional mean. Second row left 
panel: scatter plot of second versus first FMC, where the bold black dot represents the 
manifold mean. Second row right panel and third row panels: three randomly selected curves 
(solid red), with the corresponding principal component based predictions X^ (3.2) (dash 
blue), and manifold based predictions X^ 4 (3.10) (dash-dot black) for L — d = 2. Bottom 
panels: estimates of principal component based mode Xi, a (2.7) (left) and of manifold 
mode X^^ (2.9) (right) of functional variation for a = — 2, —1,0,1,2. 

senting important growth features adequately [13, 22]. Since phase variation 
introduces nonlinear features in functional data, it is of interest to deter- 
mine whether the analysis of growth data may benefit from the manifold 
approach. 

We apply the manifold approach to the Berkeley growth data for fe- 
males [32]. The data contain height measurements for 54 girls, with 31 mea- 
surements taken between the ages of 1 and 18 years. Interest usually focuses 
on growth velocity [11], which we obtain by smoothing the first-order differ- 
ence quotients of the curves. The resulting growth velocity curves are shown 
in the top left panel of Figure 5, together with the cross-sectional mean and 
the estimated manifold mean fi M (3.11). Similarly to Figures 2-4, the de- 
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Fig. 6. Yeast cell cycle gene expression data. Top panel: all trajectories in different 
colors according to cluster membership: Gl (solid red), S (dash-dot cyan), G2/M (dash 
green), M/Gl (dotted blue) and S/G2 (solid black). Middle left panel: estimated mani- 
fold mean j± M (3.11) (dash-dot black) and cross-sectional mean (dash blue). Middle right 
panel: scatter plot of second versus first FPC, where the blue cross indicates cross-sectional 
mean and the bold black dot indicates manifold mean. Bottom panels: estimates of prin- 
cipal component based mode Xi, a (2.7) (left) and of manifold mode X^ a (2.9) (right) of 
functional variation for a = —2, —1, 0, 1, 2. 

scriptions of Figures 5-7 refer to the color online versions. The location of 
the cross-sectional mean, which falls at (0,0), and the location of the esti- 
mated manifold mean are indicated in the scatter plot of second versus first 
FPC (top right panel), which displays the "horseshoe" pattern described 
above. This, and the fact that the cross-sectional mean is away from the 
main data cloud, point to inherent nonlinearity in these data. 

Mean squared prediction errors (MSPE) and relative squared prediction 
errors (RSPE) for the leave-one-out predictions of Xi, as described in Sec- 
tion 5, are listed in Table 5. The fractions of distance explained (FDE), de- 
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Fig. 7. Human mortality data. Top left panel: death rates for five randomly selected 
countries. Top right panel: estimates of cross-sectional mean (dash blue) and manifold 
mean fi (3.3) (dash-dot black). Second row: scatter plots of second versus first FPC (left) 
and second versus first FMC (right), where the blue cross indicates the cross-sectional mean 
and the bold black dot indicates the manifold mean. Third row: two randomly selected curves 
(solid red), with the corresponding principal component based predictions X i (3.2) (dash 
blue), and manifold based predictions X^ 4 (3.10) (dash-dot black) for L — d = 3. Bottom 
panels: estimates of principal component based mode Xi tCt (2.7) (left) and of manifold 
mode X^ (2.9) (right) of functional variation for a = — 2, —1,0,1,2. 

fined in (3.14), for different dimensions d are shown in Table 1. The MSPE 
of Xj" is minimized at L = 5, with L = 2 already a quite good choice. 

We find that X^ consistently improves upon Xj", the fit obtained from 
functional principal components. Note that we used the preliminary estima- 
tor Xf~ in (3.10) with K = 4, applying criterion (3.5). The FDE criterion 
indicates that these data can be well described by a one-dimensional man- 
ifold. The middle three panels of Figure 5 include three randomly selected 
curves, along with the predictions Xf 1 and X^ using L = d = 2. The two 
bottom panels of Figure 5 illustrate the comparison of estimated manifold 
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Table 5 

Mean squared prediction errors and relative squared prediction errors for growth, yeast 

and mortality data 



MSPE with Lord RSPE with Lord (%) 



Data 


Method 


i 


2 


3 


4 


5 


1 


2 


3 


4 


5 


Growth 




17.1 


12.9 


13.8 


13.7 


12.6 


62 


47 


50 


50 


46 




xy 


10.7 


9.46 


9.06 


9.21 


9.08 


39 


34 


33 


33 


33 


Yeast 


xt 


0.639 


0.382 


0.257 


0.205 


0.203 


67 


40 


27 


22 


21 




xy 


0.468 


0.278 


0.231 


0.210 


0.206 


49 


29 


24 


22 


22 


Mortality 


xt 


7.38 


6.34 


5.44 


5.48 


5.21 


54 


47 


40 


40 


38 




xy 


6.77 


5.64 


5.40 


5.26 


4.98 


50 


41 


40 


39 


37 



modes of functional variation with the principal component based modes. 
The manifold modes are clearly more useful and adequately reflect the time- 
warping feature of these data. The first manifold mode specifically suggests 
that for girls, a puberty growth peak at a late age, especially after age 12, 
tends to have a smaller amplitude; this is in line with auxological knowl- 
edge. Overall, the manifold mode is seen to provide a clearer and much 
more adequate description of the longitudinal dynamics of these data. 

6.2. Yeast cell cycle gene expression. Temporal expression curves for 
yeast cell cycle related genes were obtained by [29]. There are 6,178 genes 
in total, where each gene expression time-course consists of 18 data points, 
measured every 7 minutes between and 119 minutes. Groups of genes are 
thought to be coexpressed coherently across different time periods, accord- 
ing to the role played by the genes in the time progression of the cell cycle. 
The dynamics of the gene expression levels are complex. Temporal regu- 
larization of gene expression is a characteristic of gene function, suggesting 
models that incorporate time- warping [24, 30]. 

The data we study consist of 90 genes that have been identified by bi- 
ological methods [29]. Of these genes, 44 are thought to be related to Gl 
phase regulation of the yeast cell cycle and 46 to non-Gl phase regulation (S, 
S/G2, G2/M and M/Gl phases). Time courses of gene expression (top panel 
of Figure 6) for these clusters reveal two peaks for the Gl (solid red) and 
S (dash-dot cyan) groups, and one peak for G2/M (dash green) and M/Gl 
(dotted blue) groups, while the trajectories for the S/G2 (solid black) group 
are highly variable with no obvious peak. 

The proposed manifold analysis was applied to this set of 90 genes. The 
estimated manifold mean fi M (3.11) (middle left panel of Figure 6) is seen to 
fall within the Gl group (solid red in the top panel). In contrast, the cross- 
sectional mean is almost flat and does not reflect useful information about 
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these data. We also calculated the MSPE (Table 5) of X^ (3.2) and Xf 4 
(3.10), using preliminary estimators Xf- with K = 4 in (3.10). The manifold- 
based predictions are seen to be much better for d = 1 and 2, while they 
become more similar in performance to X^ when d increases. 

In the two bottom panels of Figure 6, we display the estimated manifold 
mode (right) and the principal component based mode of functional vari- 
ation (left). The latter is found to be deceptive, as it indicates amplitude 
variation around a few fixed "knots," while the first manifold mode clearly 
illustrates the actual temporal variation in the data, which is mainly caused 
by phase shifts. Each of the five groups, except the S/G2 group (solid black), 
is well represented by the variation across this manifold mode. 

6.3. Human mortality across countries. The death rates derived from 
current lifetable cohorts for 44 countries in the year 2000, recorded for each 
age ranging from to 110, have been collected and are as described in 
http://www.lifetable.de/. Death rates are widely used for descriptive 
and analytical purposes in public health, and cross-country comparisons are 
of particular interest here. 

We view log-transformed annual death rates as noisy measurements of 
underlying smooth trajectories. Five sample trajectories are shown in the 
top left panel of Figure 7. The mortality trajectories are densely sampled, 
but the annual rates are quite noisy. We presmoothed this data, follow- 
ing (3.6). The resulting MSPEs for Jtf 4 (3.10) and X[ (3.2) are in Table 5. 
Manifold-based prediction is seen to perform better than linear principal 
component based prediction, regardless of the choice of dimension. This is 
also illustrated by the panels in the third row of Figure 7, where predicted 
trajectories are obtained for L = d = 3. For these data, the estimated man- 
ifold mean [i (3.11) does not differ dramatically from the cross-sectional 
mean (top right panel and second row left panel). However, the first man- 
ifold mode of variation (bottom right panel) indicates that countries with 
overall lower death rates, or more specifically, with death rates below the 
mean curve (solid red), exhibit less variation than those with death rates 
above the mean, especially for ages from to 40. This finding is in line with 
the skewness that is apparent in the scatter plots, but is not seen in the 
principal component based mode (bottom left panel). The observed gains in 
prediction error for the manifold approach provide evidence that substantial 
nonlinearity is present in these data. 

7. Discussion. While the proposed functional manifold implementations 
were running relatively fast on a linux server, observing that the computa- 
tional complexity of classical MDS is of the order 0(n 3 ), computational diffi- 
culties may arise for truly large sample sizes n. In such situations, one might 
consider to base the proposed methods on landmark MDS [7], where one em- 
ploys landmarks to significantly reduce the computational complexity. 
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The proposed method relies on two major assumptions: the isometry of 
the underlying functional manifold and that the target manifold is close or 
identical to the manifold identified by ISOMAP at the sample points. As for 
the isometry assumption, even if it is violated, the proposed method proves 
to be beneficial, as it often will provide for a much sparser representation 
of functional data in comparison with linear methods in cases where the 
underlying manifold is nonlinear, even if this manifold is not isometric. This 
is discussed in detail in Section 5.1 and borne out by simulations. As for 
the closeness of the ISOMAP solution to the true manifold at the sample 
points, this assumption and its underlying justification pertains to ISOMAP 
for vector data as proposed in [31]. 

Starting from the simplifying assumption that the ISOMAP identified 
manifold and the target manifold are essentially identical at the sample 
points, we proceed to extend the estimation of the manifold function to the 
entire space of interest. We note that such simplifying assumptions are often 
beneficial when deploying complex statistical methodology, as even when the 
assumptions are not completely satisfied, the resulting methodology may 
turn out to be more efficient than existing methods. 

Overall, we find that the proposed manifold mean and manifold modes 
of functional variation provide useful representations that are competitive 
with and often superior over classical linear representations for functional 
data. The proposed functional manifold representations thus complement 
the established linear representations, notably the Karhunen-Loeve repre- 
sentation, and in many instances provide more efficient models with better 
interpretations. 

APPENDIX: ASSUMPTIONS 

(Al) The bandwidths h^, h v , Kq for estimating fi(t), a 2 , G(t,s) in Sec- 
tion 3.1 satisfy: — > 0, nh^ — > oo and nh® < oo; ha — > 0, ntiQ — > oo and 
nh,Q < oo; hy — > 0, nhy — > oo and nhy < oo. 

(A2) The smoothing kernels for the mean function fi and kq for 
the covariance function G in Section 3.1 are absolutely integrable, that is, 
J \Kfj,(t)\ dt < oo and f f \nc(t, s)\ dtds < oo. 

(A3) For Tij = tij — ti j—i and r* = maxj Ty, it holds that r* = O p (r^), 
where r n = max{ A-> , A , J, }. 

(A4) The eigenvalues of the covariance function G(t, s) satisfy < C\hT a 
for some constants C\ < oo, a\ > 1, and if Afe > 0, then — A^ + i > C^A; - " 2 
for some constants C*2 > and «2 > 0. 

(A5) For any X eM, X is differentiate and \\XWoo = O p (l), \\X'\\oo = 
O p (l). The covariance function G(t, s) is twice differentiate in both t and s, 

and supj sg7 - \G(t, s)\ < C3, sup t sg7 - 1 9 g t Q' s s ^ \ < C4 for some constants C3, 
C4 < 00. 
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(Bl) The estimates if) of tp converge uniformly on the sample space, that 
is, Ev n -> for v n = sup i=lt ^ tn \\tp(Xi) - ip(Xi)\\. 

(B2) Each component of the d-vector ip{X) has a finite fourth moment, 
and its covariance matrix is positive definite. 

(CI) The d-vector ip(X) admits a density function /, which is twice differ- 
entiable with continuous partial derivatives and uniformly bounded Hessian 
matrix. 

(C2) The d-dimensional nonnegative kernel k satisfies j n(u)du = 1, 
k(u) = «(— u), det(f K(u)uu r du) < oo, f K 2 (u)du < oo, and is Lipschitz 
continuous with compact support, {uGl 1 *: ||u|| < 1}. 

(C3) The map if) -1 :M. d — > L 2 is twice Frechet differentiable, that is, there 
exist bounded linear operators : M. d — > L 2 , A^f'xlR^ L 2 such that 

lim l^-Hu + uQ-^^H-^K)!!^ = Q) 

ui->0 || u l|| 
^u+u 2 (ui) " ^u+u 2 (ui) " AI(U U U 2 )\\ L 2 n 



lim 

u 2 -s>0 || u 2|| 

for all u,ui,U2 € In addition, ^"u"!.'"^^ 2 is continuous and uniformly 
bounded w.r.t. u. 
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SUPPLEMENTARY MATERIAL 

Supplement to "Nonlinear manifold representations for functional data" 

(DOI: 10.1214/11-AOS936SUPP; .pdf). An online supplementary file con- 
tains the detailed proofs for Propositions 1-4, Theorem 1 and Corollary 1. 
These proofs make use of material in references [15, 20, 25, 26]. 
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